[Draft] feat(btree): Intro b-tree global index and add tests for java compatibility. #212
ChaomingZhangCN wants to merge 35 commits into alibaba:main
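# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.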
- Add BtreeGlobalIndexWriter for writing btree global index files
- Fix AllNonNullRows() compilation errors:
  - Use GetLongCardinality() instead of Cardinality()
  - Use AddRange(Range(0, total_rows - 1)) instead of AddRange(0, total_rows)
- Add unit tests for btree file footer, index meta, writer, and indexer
- Add integration test for btree global index
# Conflicts:
#   src/paimon/CMakeLists.txt
#   src/paimon/common/global_index/CMakeLists.txt
#   src/paimon/common/io/cache/cache_key.h
#   src/paimon/common/sst/block_cache.h
- Add B-tree compatibility test to ensure data compatibility with Java implementation
- Implement B-tree global index writer with proper file format
- Add integration tests for B-tree global index
- Refactor SST block footer to sort lookup store footer
- Update file index reader to support B-tree format
- Add comprehensive test data for compatibility verification

Co-Authored-By: Claude Opus <noreply@anthropic.com>
- Change BIGINT from string format to 8-byte little-endian binary format
- Change TINYINT from string format to 1-byte binary format
- Change SMALLINT from string format to 2-byte little-endian binary format
- Update compatibility test data to match new binary format
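As a rough illustration of the fixed-width encoding this commit describes, the sketch below serializes an integer into little-endian bytes independent of host byte order. The helper names `EncodeLittleEndian` and `DecodeLittleEndian` are hypothetical and do not come from the PR; the actual serializer in the change may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical helper (not from the PR): write an integer as sizeof(T)
// little-endian bytes, so BIGINT -> 8 bytes, SMALLINT -> 2, TINYINT -> 1.
template <typename T>
std::string EncodeLittleEndian(T value) {
    static_assert(std::is_integral<T>::value, "integral types only");
    using U = typename std::make_unsigned<T>::type;
    const U bits = static_cast<U>(value);
    std::string out(sizeof(T), '\0');
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        // Least significant byte goes first.
        out[i] = static_cast<char>(static_cast<unsigned char>((bits >> (8 * i)) & 0xFF));
    }
    return out;
}

// Hypothetical inverse: rebuild the integer from exactly sizeof(T) bytes.
template <typename T>
T DecodeLittleEndian(const std::string& bytes) {
    static_assert(std::is_integral<T>::value, "integral types only");
    assert(bytes.size() == sizeof(T));
    using U = typename std::make_unsigned<T>::type;
    U bits = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        const U byte = static_cast<U>(static_cast<unsigned char>(bytes[i]));
        bits = static_cast<U>(bits | static_cast<U>(byte << (8 * i)));
    }
    return static_cast<T>(bits);
}
```

Shifting the value rather than copying its in-memory representation keeps the output identical on big-endian and little-endian hosts, which is what a cross-language compatibility test would depend on.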
8159962 to 319d889
92c2349 to caafcca
- Add BTree Index configuration options (BTREE_INDEX_CACHE_SIZE, BTREE_INDEX_HIGH_PRIORITY_POOL_RATIO, etc.)
- Fix SstFileWriter constructor parameter order
- Update CacheManager to use options from configuration
- Merge main branch changes for CacheManager integration
Impressive work on this complex feature. Thanks for the contribution. Review coming up.
1. Fix modernize-use-auto warnings
   - Replace explicit type declarations with auto when initializing with template casts
   - Fixed 13 warnings in btree_global_indexer.cpp
   - Fixed 5 warnings in btree_global_indexer_test.cpp
2. Fix AddressSanitizer alloc-dealloc-mismatch errors
   - Replace Bytes::AllocateBytes with std::make_shared<Bytes>
   - Avoid memory pool allocated objects being freed by operator delete
   - Fixed 13 memory allocation/deallocation mismatches
3. Fix UndefinedBehaviorSanitizer null pointer error
   - Add num_bytes > 0 check in MemorySegmentUtils::CopyToBytes
   - Avoid passing null pointer to memcpy when num_bytes is 0
4. Fix modernize-use-default-member-init warning
   - Use default member initializer for file_counter_ in btree_global_index_writer_test.cpp
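The null pointer fix in item 3 can be sketched roughly as follows. `CopyBytes` is a hypothetical stand-in, since the real signature of MemorySegmentUtils::CopyToBytes is not shown in this PR; the point is only that `memcpy` is skipped when there is nothing to copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the copy helper (not the PR's actual signature).
void CopyBytes(const char* src, char* dst, std::size_t num_bytes) {
    // memcpy expects valid pointers even when the length is zero, and
    // UndefinedBehaviorSanitizer reports a null argument. Skipping the call
    // makes an empty copy from or to a null buffer well defined.
    if (num_bytes > 0) {
        std::memcpy(dst, src, num_bytes);
    }
}
```

An empty buffer commonly has a null data pointer, so the guard matters most for zero-length keys or values.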
a395732 to e45da4c
    BlockFooter footer(index_block_handle, bloom_filter_handle);
    auto slice = footer.WriteBlockFooter(pool_.get());
    Status SstFileWriter::WriteSlice(const MemorySlice& slice) {
    auto data = slice.ReadStringView();
What's the difference between WriteSlice(slice) and Write(slice)? Seems only one func is enough.
There's no function named Write(slice).
    ASSERT_NE(deserialized, nullptr);

    // Verify keys are null
    EXPECT_EQ(deserialized->FirstKey(), nullptr);
Prefer ASSERT_* over EXPECT_* when the failure of a condition would make the rest of the test meaningless or lead to undefined behavior (e.g., null pointers, invalid setup).
    TEST_F(BTreeIndexMetaTest, SerializeDeserializeWithOnlyFirstKey) {
    // Create a BTreeIndexMeta with only first_key (edge case)
    auto first_key = std::make_shared<Bytes>("first", pool_.get());
Is there any scenario in which this case could occur?
Also SerializeDeserializeWithOnlyLastKey().
    class BTreeFileFooter {
    public:
    static Result<std::shared_ptr<BTreeFileFooter>> Read(MemorySliceInput& input);
    static MemorySlice Write(const std::shared_ptr<BTreeFileFooter>& footer, MemoryPool* pool);
For out param, please use * rather than &.
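A minimal sketch of the convention requested here, using a hypothetical and much simplified `SliceInput` in place of MemorySliceInput (whose real interface is not shown in this PR). Arguments that the callee mutates are passed by pointer, so the mutation is visible at the call site.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical, simplified stand-in for MemorySliceInput: a read cursor over bytes.
struct SliceInput {
    std::string data;
    std::size_t pos = 0;
};

// Both the cursor and the result are modified, so both are pointers.
// Returns false when an argument is null or no bytes remain.
bool ReadByte(SliceInput* input, std::uint8_t* out) {
    if (input == nullptr || out == nullptr || input->pos >= input->data.size()) {
        return false;
    }
    *out = static_cast<std::uint8_t>(input->data[input->pos++]);
    return true;
}
```

A call then reads `ReadByte(&input, &value)`, where the `&` signals that both arguments may change, whereas a non-const reference parameter gives no such hint to the reader.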
    RoaringNavigableMap64 result;
    result.AddRange(Range(0, total_rows - 1));
    result.AndNot(*null_bitmap_);
    return result;
If row ids are not in order (keys are sorted), we can't rely on short-circuiting logic, as the full range is unknown.
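The computation in the snippet under review can be sketched with standard containers as below. This is only an illustration: `std::set` stands in for RoaringNavigableMap64, and the signature is hypothetical. It shows the inclusive upper bound `total_rows - 1` followed by the and-not step that drops null rows.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Stand-in for the bitmap type; std::set is used purely for illustration.
using RowSet = std::set<std::int64_t>;

// Hypothetical sketch: all rows in [0, total_rows - 1] that are not null.
RowSet AllNonNullRows(std::int64_t total_rows, const RowSet& null_rows) {
    RowSet result;
    // The range is inclusive on both ends, hence total_rows - 1.
    // With a signed counter the loop adds nothing when total_rows <= 0.
    for (std::int64_t row = 0; row <= total_rows - 1; ++row) {
        if (null_rows.count(row) == 0) {  // and-not: skip rows marked null
            result.insert(row);
        }
    }
    return result;
}
```

If the row count in the real code is unsigned, `total_rows - 1` would wrap around for an empty index, so that case may need separate handling; this is an assumption about the surrounding code, not something the PR states.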
# Conflicts:
#   include/paimon/defs.h
#   src/paimon/common/defs.cpp
#   src/paimon/common/io/cache/lru_cache.h
Purpose
Linked issue: close #38
Tests
Note: Google Test filter = BTree*
[==========] Running 47 tests from 6 test suites.
[----------] Global test environment set-up.
[----------] 7 tests from BTreeIndexMetaTest
[----------] 7 tests from BTreeFileFooterTest
[----------] 16 tests from BTreeGlobalIndexerTest
[----------] 7 tests from BTreeGlobalIndexWriterTest
[----------] 5 tests from BTreeGlobalIndexIntegrationTest
[----------] 5 tests from BTreeCompatibilityTest
[----------] Global test environment tear-down
[==========] 47 tests from 6 test suites ran. (26 ms total)
[ PASSED ] 47 tests.
API and Format
Documentation
Generative AI tooling
Generated-by: Claude Code